SpatialDataSpatialData 0.99.17
library(ggplot2)
library(patchwork)
library(ggnewscale)
library(SpatialData)
library(SingleCellExperiment)
The SpatialData package contains a set of reader and plotting functions for
spatial omics data stored as SpatialData
.zarr files that follow OME-NGFF specs.
Each SpatialData object is composed of five layers: images, labels, shapes,
points, and tables. Each layer may contain an arbitrary number of elements.
Images and labels are represented as ZarrArrays (Rarr).
Points and shapes are represented as arrow objects linked
to an on-disk .parquet file. As such, all data are represented out of memory.
Element annotation as well as cross-layer summarizations (e.g., count matrices) are represented as SingleCellExperiment as tables.
For demonstration, we read in a toy dataset that is available through the package:
x <- file.path("extdata", "blobs.zarr")
x <- system.file(x, package="SpatialData")
(x <- readSpatialData(x, anndataR=TRUE))
## class: SpatialData
## - images(2):
## - blobs_image (3,64,64)
## - blobs_multiscale_image (3,64,64)
## - labels(2):
## - blobs_labels (64,64)
## - blobs_multiscale_labels (64,64)
## - points(1):
## - blobs_points (200)
## - shapes(3):
## - blobs_circles (5,circle)
## - blobs_multipolygons (2,polygon)
## - blobs_polygons (5,polygon)
## - tables(1):
## - table (3,10)
## coordinate systems:
## - global(8): blobs_image blobs_multiscale_image ... blobs_polygons
## blobs_points
## - scale(1): blobs_labels
## - translation(1): blobs_labels
## - affine(1): blobs_labels
## - sequence(1): blobs_labels
SpatialData object behave like a list, thereby supporting flexible accession,
e.g., via $ and [[, using character or integer indices etc. Specifically,
image/label/shape/point/table() retrieve one elementimages/labels/shapes/points/tables() retrieve one layerimage/label/shape/point/tableNames() retrieve element namesLet’s demonstrate these capabilities using the image layer:
shapeNames(x) # there are two images
shapes(x) # this is a list of them
shape(x, i=1) # this is the 1st one
shape(x, "blobs_polygons") # this is the 3nd one
x$shapes$blobs_circles # list-like handling also works
Each element is made up of two key components:
data: the actual array or table (for images/labels and shapes/points, respectively)meta: list of metadata retrieved from an OME-NGFF compliant .zattrs (stored as Zattrs)data(shape(x))
## FileSystemDataset with 1 Parquet file
## 2 columns
## geometry: geoarrow.wkb <crs: unspecified>
## radius: int64
##
## See $metadata for additional Schema metadata
meta(shape(x))
## An object of class "Zattrs"
## [[1]]
## [1] "x" "y"
##
## [[2]]
## input.axes input.name output.axes output.name type
## 1 c("x", ".... xy c("x", ".... global identity
##
## [[3]]
## [1] "ngff:shapes"
##
## [[4]]
## [[4]]$version
## [1] "0.2"
Currently supported methods for the tables layers include:
getTable() to retrieve the table annotating a given elementsetTable() to construct a new annotation for a given elementvalTable() to retrieve specific values from a table# retrieve 'table' for an element
(t <- getTable(x, i <- "blobs_labels"))
## class: SingleCellExperiment
## dim: 3 10
## metadata(0):
## assays(1): X
## rownames(3): channel_0_sum channel_1_sum channel_2_sum
## rowData names(0):
## colnames(10): 3 4 ... 15 16
## colData names(2): instance_id region
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
head(colData(t))
## DataFrame with 6 rows and 2 columns
## instance_id region
## <integer> <factor>
## 3 3 blobs_labels
## 4 4 blobs_labels
## 5 5 blobs_labels
## 8 8 blobs_labels
## 10 10 blobs_labels
## 11 11 blobs_labels
meta(t)
## $instance_key
## [1] "instance_id"
##
## $region
## [1] "blobs_labels"
##
## $region_key
## [1] "region"
# get values from an element's 'table'
valTable(x, i, "channel_0_sum")
## 3 4 5 8 10 11 12
## 25.0537079 10.6787960 3.2422082 3.4709382 7.7414086 0.9912003 40.3020305
## 13 15 16
## 0.2252217 1.2125781 7.0430209
# add an artificial 'table' to an element
i <- "blobs_points"
# ...passing a preconstructed 'data.frame'
n <- length(point(x, i))
df <- data.frame(n=runif(n))
y <- setTable(x, i, df)
head(colData(getTable(y, i)))
## DataFrame with 6 rows and 3 columns
## rk instance_id n
## <character> <integer> <numeric>
## 1 blobs_points 9 0.55876741
## 2 blobs_points 7 0.02619383
## 3 blobs_points 3 0.72022483
## 4 blobs_points 9 0.47306559
## 5 blobs_points 4 0.00516161
## 6 blobs_points 5 0.71987193
# ...using a list of data generating functions
f <- list(
numbers=\(n) runif(n),
letters=\(n) sample(letters, n, TRUE))
args <- c(list(x, i), f)
y <- do.call(setTable, args)
head(colData(getTable(y, i)))
## DataFrame with 6 rows and 4 columns
## rk instance_id numbers letters
## <character> <integer> <numeric> <character>
## 1 blobs_points 9 0.0439816 n
## 2 blobs_points 7 0.9663845 i
## 3 blobs_points 3 0.5930463 c
## 4 blobs_points 9 0.7523034 s
## 5 blobs_points 4 0.3960685 q
## 6 blobs_points 5 0.5055653 k
To facilitate .zattrs handling, we provide a set of functions to access and modify its contents. Specifically,
axes to access a given element’s dimension names and typesCTdata/name/type to access coordinate transformation componentsrmv/addCT to remove, add, or append coordinate transformationsz <- meta(label(x))
axes(z)
## name type
## 1 y space
## 2 x space
CTdata(z)
## input.axes input.name output.axes output.name type scale
## 1 c("y", ".... yx c("y", ".... global identity
## 2 c("y", ".... yx c("y", ".... scale scale 3, 2
## 3 c("y", ".... yx c("y", ".... translation translation
## 4 c("y", ".... yx c("x", ".... affine affine
## 5 c("y", ".... yx c("y", ".... sequence sequence
## translation affine transformations
## 1
## 2
## 3 -50, 10
## 4 20, 50, ....
## 5 list(axe....
CTname(rmvCT(z, "scale"))
## [1] "global" "translation" "affine" "sequence"
CTname(addCT(z,
name="D'Artagnan",
type="scale",
data=c(19, 94)))
## [1] "global" "scale" "translation" "affine" "sequence"
## [6] "D'Artagnan"
Zattrs specify an explicit relationship between elements and coordinate systems.
We can represent these are a graph as follows:
(g <- SpatialData:::.coord2graph(x))
## A graphAM graph with directed edges
## Number of Nodes = 14
## Number of Edges = 13
graph::plot(g)
The above representation greatly facilitates queries of the transformation(s)
required to spatially align elements. blobs_labels, for example, requires a
sequential transformation (scaling and translation) for the sequence space:
SpatialData:::.get_path(g, "blobs_labels", "sequence")
## [[1]]
## [[1]]$data
## [1] 3 2
##
## [[1]]$type
## [1] "scale"
##
##
## [[2]]
## [[2]]$data
## [1] -50 10
##
## [[2]]$type
## [1] "translation"
Image/LabelArrays are linked to potentially multiscale .zarr stores.
Their show method includes the scales available for a given element:
image(x, "blobs_image")
## class: ImageArray
## Scales (1): (3,64,64)
image(x, "blobs_multiscale_image")
## class: ImageArray (MultiScale)
## Scales (3): (3,64,64) (3,32,32) (3,16,16)
Internally, multiscale ImageArrays are stored as a list of ZarrArray, e.g.:
i <- image(x, "blobs_multiscale_image")
vapply(i@data, dim, numeric(3))
## [,1] [,2] [,3]
## [1,] 3 3 3
## [2,] 64 32 16
## [3,] 64 32 16
To retrieve a specific scale’s ZarrArray, we can use data(., k),
where k specifies the target scale. This also works for plotting:
wrap_plots(nrow=1, lapply(seq(3), \(.)
plotSpatialData() + plotImage(x, i=2, k=.)))
i <- "blobs_labels"
p <- plotSpatialData()
pal_d <- hcl.colors(10, "Spectral")
pal_c <- hcl.colors(9, "Inferno")[-9]
a <- p + plotLabel(x, i) # simple binary image
b <- p + plotLabel(x, i, "instance_id", pal=pal_d) # 'colData'
c <- p + plotLabel(x, i, "channel_1_sum", pal=pal_c) # 'assay'
(a | b | c) +
plot_layout(guides="collect") &
theme(legend.position="bottom")
i <- "blobs_points"
p <- plotSpatialData()
# mock up a 'table'
f <- list(
numbers=\(n) runif(n),
letters=\(n) sample(letters, n, TRUE))
y <- setTable(x, i, f)
# demo. viz. capabilities
a <- p + plotPoint(y, i) # simple dots
b <- p + plotPoint(y, i, "letters") # discrete coloring
c <- p + plotPoint(y, i, "numbers") # continuous coloring
a | b | c
p <- plotSpatialData()
a <- p +
ggtitle("polygons") +
plotShape(x, "blobs_polygons")
b <- p +
ggtitle("multipolygons") +
plotShape(x, "blobs_multipolygons")
c <- p +
ggtitle("circles") +
plotShape(x, "blobs_circles")
wrap_plots(a, b, c)
p <- plotSpatialData()
# joint
all <- p +
plotImage(x) +
plotLabel(x, a=1/3) +
plotShape(x, 1) +
plotShape(x, 3) +
new_scale_color() +
plotPoint(x, c="genes") +
ggtitle("layered")
# split
one <- list(
p + plotImage(x) + ggtitle("image"),
p + plotLabel(x) + ggtitle("labels"),
p + plotShape(x, 1) + ggtitle("circles"),
p + plotShape(x, 3) + ggtitle("polygons"),
p + plotPoint(x, c="genes") + ggtitle("points"))
wrap_plots(c(list(all), one), nrow=2)
Data from a variety of technologies has been made available as SpatialData .zarr stores here
here.
These, in turn, have been deposited in Bioconductor’s NSF Open Storage Network bucket,
and can be retrieved from there with support for caching using BiocFileCache.
We can interrogate the bucket for available (zipped) .zarr archives:
available_spd_zarr_zips()
## [1] "mcmicro_io.zip"
## [2] "merfish.zarr.zip"
## [3] "mibitof.zip"
## [4] "steinbock_io.zip"
## [5] "visium_associated_xenium_io_aligned.zip"
## [6] "visium_hd_3.0.0_io.zip"
Any of the above can be retrieved (once) into some location, and read into R:
dir.create(td <- tempfile())
pa <- unzip_spd_demo(
zipname="merfish.zarr.zip",
dest=td, source="biocOSN")
(x <- readSpatialData(pa))
## class: SpatialData
## - images(1):
## - rasterized (1,522,575)
## - labels(0):
## - points(1):
## - single_molecule (3714642)
## - shapes(2):
## - anatomical (6,polygon)
## - cells (2389,circle)
## - tables(1):
## - table (268,2389)
## coordinate systems:
## - global(4): rasterized anatomical cells single_molecule
In this example data, we do not have a label for the shape polygons.
Such labels could be morphological regions annotated by pathologists.
There are only 2389 cells, but 3,714,642 molecules, so that we downsample a random subset of 2,000 for visualization:
point(x, "2k") <- (p <- point(x))[sample(length(p), 2e3)]
plotSpatialData() +
plotImage(x) +
plotPoint(x, i="2k", c="cell_type", s=0.2) +
new_scale_color() +
plotShape(x, i="anatomical") +
scale_color_manual(values=hcl.colors(6, "Spectral"))
qu <- list(xmin=1800, xmax=2400, ymin=5000, ymax=5400)
bb <- geom_rect(do.call(aes, qu), data.frame(qu), col="yellow", fill=NA)
y <- SpatialData(images=list("."=do.call(query, c(list(x=image(x)), qu))))
plotSpatialData() + plotImage(x) + bb | plotSpatialData() + plotImage(y)
Colorectal carcinoma, 25 MB; no shapes, no points.
dir.create(td <- tempfile())
pa <- unzip_spd_demo(
zipname="mibitof.zip",
dest=td, source="biocOSN")
(x <- readSpatialData(pa))
## class: SpatialData
## - images(3):
## - point16_image (3,1024,1024)
## - point23_image (3,1024,1024)
## - point8_image (3,1024,1024)
## - labels(3):
## - point16_labels (1024,1024)
## - point23_labels (1024,1024)
## - point8_labels (1024,1024)
## - points(0):
## - shapes(0):
## - tables(1):
## - table (36,3309)
## coordinate systems:
## - point16(2): point16_image point16_labels
## - point23(2): point23_image point23_labels
## - point8(2): point8_image point8_labels
pal <- hcl.colors(8, "Spectral")
wrap_plots(nrow=1, lapply(seq(3), \(.)
plotSpatialData() + plotImage(x, .) +
plotLabel(x, ., "Cluster", pal=pal))) +
plot_layout(guides="collect")
Mouse intestine, 1GB; 4 image resolutions and 3 shapes at 2, 8, and 16 \(\mu\)m.
dir.create(td <- tempfile())
pa <- unzip_spd_demo(
zipname="visium_hd_3.0.0_io.zip",
dest=td, source="biocOSN")
(x <- readSpatialData(pa, images=4, shapes=3, tables=NULL))
## class: SpatialData
## - images(1):
## - Visium_HD_Mouse_Small_Intestine_lowres_image (3,558,600)
## - labels(0):
## - points(0):
## - shapes(1):
## - Visium_HD_Mouse_Small_Intestine_square_016um (91033,circle)
## - tables(0):
## coordinate systems:
## - downscaled_lowres(2): Visium_HD_Mouse_Small_Intestine_lowres_image
## Visium_HD_Mouse_Small_Intestine_square_016um
## - global(1): Visium_HD_Mouse_Small_Intestine_square_016um
## - downscaled_hires(1): Visium_HD_Mouse_Small_Intestine_square_016um
qu <- list(xmin=100, xmax=300, ymin=200, ymax=400)
bb <- geom_rect(do.call(aes, qu), data.frame(qu), col="black", fill=NA)
y <- SpatialData(images=list("."=do.call(query, c(list(x=image(x)), qu))))
plotSpatialData() + plotImage(x) + bb | plotSpatialData() + plotImage(y)
Small lung adenocarcinoma, 250 MB; 1 image, 2 labels, 2 tables.
dir.create(td <- tempfile())
pa <- unzip_spd_demo(
zipname="mcmicro_io.zip",
dest=td, source="biocOSN")
(x <- readSpatialData(pa))
## class: SpatialData
## - images(1):
## - exemplar-001_image (12,3139,2511)
## - labels(2):
## - exemplar-001_cell (3139,2511)
## - exemplar-001_nuclei (3139,2511)
## - points(0):
## - shapes(0):
## - tables(2):
## - exemplar-001--ilastik_cell (12,11607)
## - exemplar-001--unmicst_cell (12,11170)
## coordinate systems:
## - global(3): exemplar-001_image exemplar-001_cell exemplar-001_nuclei
4 different cancers (SCCHN, BCC, NSCLC, CRC), 820 MB; 14 images, 14 labels, 1 table.
dir.create(td <- tempfile())
pa <- unzip_spd_demo(
zipname="steinbock_io.zip",
dest=td, source="biocOSN")
(x <- readSpatialData(pa))
## class: SpatialData
## - images(14):
## - Patient1_001_image (40,600,600)
## - Patient1_002_image (40,600,600)
## - Patient1_003_image (40,600,600)
## - Patient2_001_image (40,600,600)
## - Patient2_002_image (40,600,600)
## - Patient2_003_image (40,600,600)
## - Patient2_004_image (40,600,600)
## - Patient3_001_image (40,600,600)
## - Patient3_002_image (40,600,600)
## - Patient3_003_image (40,600,600)
## - Patient4_005_image (40,600,600)
## - Patient4_006_image (40,600,600)
## - Patient4_007_image (40,600,600)
## - Patient4_008_image (40,600,600)
## - labels(14):
## - Patient1_001_labels (600,600)
## - Patient1_002_labels (600,600)
## - Patient1_003_labels (600,600)
## - Patient2_001_labels (600,600)
## - Patient2_002_labels (600,600)
## - Patient2_003_labels (600,600)
## - Patient2_004_labels (600,600)
## - Patient3_001_labels (600,600)
## - Patient3_002_labels (600,600)
## - Patient3_003_labels (600,600)
## - Patient4_005_labels (600,600)
## - Patient4_006_labels (600,600)
## - Patient4_007_labels (600,600)
## - Patient4_008_labels (600,600)
## - points(0):
## - shapes(0):
## - tables(1):
## - table (40,47859)
## coordinate systems:
## - Patient1_001(2): Patient1_001_image Patient1_001_labels
## - Patient1_002(2): Patient1_002_image Patient1_002_labels
## - Patient1_003(2): Patient1_003_image Patient1_003_labels
## - Patient2_001(2): Patient2_001_image Patient2_001_labels
## - Patient2_002(2): Patient2_002_image Patient2_002_labels
## - Patient2_003(2): Patient2_003_image Patient2_003_labels
## - Patient2_004(2): Patient2_004_image Patient2_004_labels
## - Patient3_001(2): Patient3_001_image Patient3_001_labels
## - Patient3_002(2): Patient3_002_image Patient3_002_labels
## - Patient3_003(2): Patient3_003_image Patient3_003_labels
## - Patient4_005(2): Patient4_005_image Patient4_005_labels
## - Patient4_006(2): Patient4_006_image Patient4_006_labels
## - Patient4_007(2): Patient4_007_image Patient4_007_labels
## - Patient4_008(2): Patient4_008_image Patient4_008_labels
## R version 4.4.1 Patched (2024-07-08 r86893)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sonoma 14.2.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/Madrid
## tzcode source: internal
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] Rgraphviz_2.50.0 graph_1.84.0
## [3] Rarr_1.6.0 DelayedArray_0.32.0
## [5] SparseArray_1.6.0 S4Arrays_1.6.0
## [7] abind_1.4-8 Matrix_1.7-1
## [9] SingleCellExperiment_1.28.0 SummarizedExperiment_1.36.0
## [11] Biobase_2.66.0 GenomicRanges_1.58.0
## [13] GenomeInfoDb_1.42.0 IRanges_2.40.0
## [15] S4Vectors_0.44.0 BiocGenerics_0.52.0
## [17] MatrixGenerics_1.19.0 matrixStats_1.4.1
## [19] SpatialData_0.99.17 ggnewscale_0.5.0
## [21] patchwork_1.3.0 ggplot2_3.5.1
## [23] BiocStyle_2.34.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.3 RBGL_1.82.0 anndataR_0.99.0
## [4] rlang_1.1.4 magrittr_2.0.3 e1071_1.7-16
## [7] compiler_4.4.1 RSQLite_2.3.8 dir.expiry_1.14.0
## [10] paws.storage_0.7.0 png_0.1-8 vctrs_0.6.5
## [13] stringr_1.5.1 wk_0.9.4 pkgconfig_2.0.3
## [16] crayon_1.5.3 fastmap_1.2.0 magick_2.8.5
## [19] dbplyr_2.5.0 XVector_0.46.0 labeling_0.4.3
## [22] paws.common_0.7.7 utf8_1.2.4 rmarkdown_2.29
## [25] UCSC.utils_1.2.0 tinytex_0.54 purrr_1.0.2
## [28] bit_4.5.0 xfun_0.49 zlibbioc_1.52.0
## [31] cachem_1.1.0 jsonlite_1.8.9 blob_1.2.4
## [34] tweenr_2.0.3 parallel_4.4.1 R6_2.5.1
## [37] bslib_0.8.0 stringi_1.8.4 reticulate_1.40.0
## [40] jquerylib_0.1.4 Rcpp_1.0.13-1 bookdown_0.41
## [43] assertthat_0.2.1 knitr_1.49 R.utils_2.12.3
## [46] tidyselect_1.2.1 rstudioapi_0.17.1 yaml_2.3.10
## [49] zellkonverter_1.16.0 curl_6.0.1 lattice_0.22-6
## [52] tibble_3.2.1 basilisk.utils_1.18.0 withr_3.0.2
## [55] evaluate_1.0.1 sf_1.0-19 units_0.8-5
## [58] proxy_0.4-27 polyclip_1.10-7 BiocFileCache_2.14.0
## [61] xml2_1.3.6 pillar_1.9.0 BiocManager_1.30.25
## [64] filelock_1.0.3 KernSmooth_2.23-24 pizzarr_0.1.0
## [67] generics_0.1.3 nanoarrow_0.6.0 munsell_0.5.1
## [70] scales_1.3.0 class_7.3-22 glue_1.8.0
## [73] tools_4.4.1 colorspace_2.1-1 paws_0.7.0
## [76] GenomeInfoDbData_1.2.13 basilisk_1.18.0 ggforce_0.4.2
## [79] cli_3.6.3 fansi_1.0.6 arrow_17.0.0.1
## [82] dplyr_1.1.4 geoarrow_0.2.1 gtable_0.3.6
## [85] R.methodsS3_1.8.2 sass_0.4.9 digest_0.6.37
## [88] classInt_0.4-10 farver_2.1.2 memoise_2.0.1
## [91] htmltools_0.5.8.1 R.oo_1.27.0 lifecycle_1.0.4
## [94] httr_1.4.7 bit64_4.5.2 MASS_7.3-61